Unexpected observations after mapping LongSAGE tags to the human genome.

Abstract

BACKGROUND: SAGE has been widely used to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are long enough to be mapped reliably to a whole-genome sequence. Here we used this property to study the genomic position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts.

RESULTS: Using a published error rate for SAGE libraries, we first removed the tags likely to result from sequencing errors. We then observed that an unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons, and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After filtering our data with these screens to ensure that the dataset is highly reliable, we studied the tags that map once to the genome. Of these tags, 31% correspond to unannotated transcripts. The others map to known transcribed regions, but nearly half of them lie either antisense to these known transcripts or in new variants of them.

CONCLUSION: We performed a comprehensive study of all publicly available human LongSAGE tags and carefully verified the reliability of these data. We found the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been underestimated. The frequency of tags that match the genome sequence once, but not within an annotated exon, suggests that the human transcriptome is much more complex than current human genome annotations show, with many new splicing variants and antisense transcripts. SAGE data are appropriate for mapping new transcripts to the genome, as demonstrated by the high rate at which the corresponding tags are cross-validated by other methods.
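The abstract's distinction between tags that do not match the genome, tags that map once, and tags that map several times can be illustrated with a small sketch. This is not the authors' pipeline: it assumes exact string matching on both strands, the names `reverse_complement`, `find_all` and `map_tags` are hypothetical, and the sequences are short toy strings rather than real LongSAGE tags or genomic sequence.

```python
# Toy illustration only: exact matching of tags against both strands
# of a made-up "genome" string. All names here are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Return the start offset of every exact, possibly overlapping, match."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def map_tags(tags, genome):
    """Map each tag to a (category, hits) pair.

    hits is a list of (strand, offset) tuples; category is "unmatched",
    "unique" (exactly one hit over both strands) or "multiple".
    """
    result = {}
    for tag in tags:
        hits = [("+", pos) for pos in find_all(genome, tag)]
        rc = reverse_complement(tag)
        if rc != tag:  # avoid counting a palindromic tag twice
            hits += [("-", pos) for pos in find_all(genome, rc)]
        if not hits:
            category = "unmatched"
        elif len(hits) == 1:
            category = "unique"
        else:
            category = "multiple"
        result[tag] = (category, hits)
    return result


if __name__ == "__main__":
    genome = "GATTACAGGCCTTGATTACA"  # toy sequence, not real data
    for tag, (category, hits) in map_tags(
        ["ACAGGCC", "GATTACA", "AAGGCC", "CCCCCCC"], genome
    ).items():
        print(tag, category, hits)
```

In a real analysis, the uniquely mapped tags would then be compared against transcript annotations to decide whether each one falls in a known exon, lies antisense to a known transcript, or sits in an unannotated region. That step depends on annotation data that the abstract does not describe, so it is left out here.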
